Templated COSA command execution #1739

darkmuggle · 2020-09-22T23:05:01Z

This provides template support for COSA commands with the goal of having
COSA understand the RHCOS jobspec. The goal of this work is to provide
templated commands. For example:
cosa entry --spec rhcos.yaml run cosa init "{{ .Recipe.GitUrl}}"

Or:
cosa entry --spec rhcos.yaml runSteps --steps rhcos.steps

Where rhcos.steps has:

    cosa init "{{ .Recipe.GitUrl }}"
    cosa fetch
    cosa build

The way I envision this being implements is via a container where the JobSpec is provided via ConfigMap:

FROM quay.io/coreos-assembler/coreos-assembler:latest
ENTRYPONT ["/usr/lib/coreos/entry"]

darkmuggle · 2020-09-22T23:08:08Z

Throwing this up as an early WIP to show the idea and see if there any interest in pursuing this. This would give us a way to translate the RHCOS Pipeline commands into GoLang templates for command execution and allow for easy migration. The RHCOS job spec isn't particularly loved per se, but its in production use for the ART and Development pipelines.

The other thought I had was replacing dumb-init with this "entry" based on go-init. My thinking is that by having a custom entry point, we could make COSA an OCP Custom BuildConfig image (e.g. oc start-build bc/cosa -env SPEC_URL=https://... )

darkmuggle · 2020-09-22T23:14:11Z

The reason I went with GoLang over Python for this piece of code was because GoLang is way nicer about cleaning up processes and allowing concurrency for parallel tasks is trivial. I created it as it own thing instead of putting into Mantle directly since it doesn't fit neatly there.

jlebon · 2020-09-23T15:57:31Z

Would it make sense to just make the steps file a shell script instead? So essentially, cosa entry would just render the script using the provided spec, and then os.exec into it.

darkmuggle · 2020-09-23T21:54:05Z

Would it make sense to just make the steps file a shell script instead? So essentially, cosa entry would just render the script using the provided spec, and then os.exec into it.

I thought about using os.exec instead of bringing in a fork of go-init but:

COSA has a history of defunct/zombie's hanging around. We have most of those handled, so this not a compelling reasons in and of itself.
The ability to use the templates from the CLI is compelling, especially for doing tests and running commands that are normally run via the pipeline on the CLI.
We have a concurrency-safe place to corrdinate parallel steps (feature not implemented) and ensure that defunct/zombie's are reaped.

Also, since this doesn't have terminal emulation, its not suitable for interactive developer hack-benches. To be sure, I can see developers entering cosa and running entry run-steps hack.steps.

cgwalters · 2020-09-23T22:57:55Z

OK a whole lot going on here.
This seems like it touches on #1668 - it'd be really nice notably to not have two vendor/ directories at least.

We have a concurrency-safe place to corrdinate parallel steps (feature not implemented) and ensure that defunct/zombie's are reaped.

There are a lot of "execute these things in parallel" languages, among them the venerable make to ninja etc. I think we need to grow some concept of dependencies, and it'd be better to reuse an existing tool that has all the niceties like topological sorting, a CLI that displays progress, integration with a jobserver etc. WDYT then about we have cosa build --spec jobspec.yaml that then turns around and generates e.g. a ninja file which then in turn re-executes some of the cosa commands, but honoring dependencies etc.?

darkmuggle · 2020-09-23T23:16:31Z

This seems like it touches on #1668 - it'd be really nice notably to not have two vendor/ directories at least.

Partly. The base idea came before #1688; the code idea never saw the light of day, though. I went ahead and dusted this off because I am angling to have COSA run as a build containers via OCP buildconfig and saw that I needed COSA to run more than a single command at once.

👍 on having vendor/ be common.

WDYT then about we have cosa build --spec jobspec.yaml that then turns around and generates e.g. a ninja file which then in turn re-executes some of the cosa commands, but honoring dependencies etc.?

I haven't really looked at Ninja, tbh. I can look. But I was avoiding cosa build specifically to avoid teaching it about the spec or even supporting variant specs.

darkmuggle · 2020-09-24T15:44:56Z

Okay, after looking at #1739 (comment), I think that this approach is still valid. Using existing tools is fine, but ninja and Makefiles really doesn't solve the problem that I was tackling. And this really goes back why coreos/enhancements#1 was filed (and closed because the idea is aweful) -- we need a file-based interface into COSA that is not a CLI.

The genesis to this PR comes from the realization that the real difference between the RHCOS and FCOS pipelines is the business logic. And when I looked deeply at the domain, it becomes clear that the bulk of the Jenkins logic amounts to rendering configuration into CLI arguments and setting up the environment. My position is that giving COSA a file interface will simplify the Jenkins pipelines significantly.

I think we need to grow some concept of dependencies, and it'd be better to reuse an existing tool that has all the niceties like topological sorting, a CLI that displays progress, integration with a jobserver etc.

I agree with understanding dependencies, and I've done some work in that regard. But, I think dependency tracking is orthogonal to this idea; a dependency tree may negate the need to run "stages" but that doesn't negate the sugar of having templating.

Integration with another tool would be a big undertaking, especially if we want a pretty CLI experience. However, integration for the Developer use-case might look different than integration for production use-cases (e.g. Jenkins, FedMsg, OpenShift pods).

jlebon · 2020-09-25T14:06:08Z

The other thought I had was replacing dumb-init with this "entry" based on go-init. My thinking is that by having a custom entry point, we could make COSA an OCP Custom BuildConfig image (e.g. oc start-build bc/cosa -env SPEC_URL=https://... )

I thought about using os.exec instead of bringing in a fork of go-init but:

Hmm, I'm still not sure I follow why we need to fold in go-init. It's only relevant when running cosa as a container, and we handle it there by wrapping the entrypoint with dumb-init, right? E.g. in my pet container workflow, I already have catatonit running as pid 1, so I wouldn't need cosa to have its own. (And ideally eventually this will be built into k8s.)

The genesis to this PR comes from the realization that the real difference between the RHCOS and FCOS pipelines is the business logic. And when I looked deeply at the domain, it becomes clear that the bulk of the Jenkins logic amounts to rendering configuration into CLI arguments and setting up the environment. My position is that giving COSA a file interface will simplify the Jenkins pipelines significantly.

Can you sketch out a bit more what the Jenkins pipeline would look like? It's worth noting that this is introducing two file interfaces: the spec file and the template file. I agree that a spec file would be useful (WDYT about calling this e.g. a property file instead so we don't confuse it with RPM spec files?), and agree that the differences between the pipelines mostly lie in the business logic, but I'm having trouble reconciling how templates fit into the current flow. IIUC, doesn't it imply that whenever we want to do something at the Jenkins layer (e.g. setting the job description, marking a new stage, spawning k8s worker pods for parallel work, starting other jobs, etc...) we would have to keep going in and out of template scripts? Or would all that stuff be obsolete in this new world?

darkmuggle · 2020-09-25T17:24:43Z

Hmm, I'm still not sure I follow why we need to fold in go-init. It's only relevant when running cosa as a container, and we handle it there by wrapping the entrypoint with dumb-init, right? E.g. in my pet container workflow, I already have catatonit running as pid 1, so I wouldn't need cosa to have its own. (And ideally eventually this will be built into k8s.)

I'm backing away from the idea of replacing dumb-init with entry as the default. Rather it would be used in the CI/Pod scenarios. Pulling in the go-init was done to handle cleanup of zombies that I have seen in my own pet container.

Can you sketch out a bit more what the Jenkins pipeline would look like? It's worth noting that this is introducing two file
interfaces: the spec file and the template file.

The world that I envision is COSA-as-a-Service where COSA is an OpenShift BuildConfig. Rather than relying on Jenkins to setup the pod, Jenkins would execute oc start-build bc/cosa --env.... --follow=true.

Why? Right now the RHCOS pipelines are greedy. The entire pipeline consumes a lot of RAM and a lot of disk space for a build and is scheduled on the same pod. In the RHCOS pipeline, we require privileges, which requires to play funny games to get /dev/kvm) and the whole pipeline runs as privileged; this is unneeded and schedules all our workloads on one pod. We could break it up and get the pods scheduled where needed. But in all reality using Jenkins podTemplates is inefficient here since each new pod gets a new agent container too. The other problem is that we need to have a Jenkins container and know what that container is, in all environments that we want to launch a pod. In the RHCOS world, we have found Agent's are a bit finicky and timeout, lose their connection, and because all pipelines are resource spendy, they get evicted.

Now, why templating when OpenShift has templates? One of the lessons learned in the RHCOS pipelines was that we ended up polluting the namespaces with essentially the same thing with minor variations. The idea of using templates is to re-use the same buildconfigs based on versions (ala bc/cosa-{4.[3-6], latest} and bc/cosa-priv-{4.[3-6, latest}).

The reason that envVar's don't work well here is Jenkins right now set the envVars. And don't get me started on truthiness in envVar strings. The RHCOS pipeline's complicated nature comes from the processing of the envVar's. OpenShift templates are a real PITA to update when new options are needed. I originally came up with the idea of having COSA support envVar's for CLI commands; that idea excited people about as much as root canal. When when I looked at the BuildConfig spec, you don't have the CMD and ENTRYPOINT interfaces. The only possible way to have COSA as a buildConfig is to have an entrypoint with an interface for files and/or the buildConfig spec.

The other big reason I wanted to move away from Jenkins as the pod scheduler is because of multi-arch. Right now we have several down-stream teams that take the RHCOS pipeline and then standup Jenkins and run parallel pipelines. The cost of standing up builders is expensive. By having the interfaces this provides, we can run COSA anywhere we can run a container (e.g. ssh me@there 'podman run --rm --entrypoint /usr/bin/entry...).

but I'm having trouble reconciling how templates fit into the current flow.

I would not reconcile it, tbh. This is future looking. Templates give us nothing today.

Consider a distributed build system on aarch64, x86_64, and ppc64el. To start a build in a remote pod via a buildConfig, COSA needs to execute several steps.

# build.steps
cosa buildprep {{ .S3.Url }}
cosa init {{ .Recipe.GitUrl}
cosa fetch
cosa build --skip-prune
cosa buildupload s3 --acl={{ .S3.Acl} {{ .S3.Url }}

And for testing:

aws s3 get s3://{{ .S3.Url }}....
kola run-upgrade --ignition-version v2 -b rhcos...

(A post-commit hook would fetch the artifacts out)

Follow? Now, to @cgwalters's point about build dependencies (i.e why not have COSA understand the dependencies?) is that build dependencies assume that dependencies are in the same place. What if the dependency exists, but needs fetching?

we would have to keep going in and out of template scripts? Or would all that stuff be obsolete in this new world?

Not sure I follow. I think what this will look like is a Jenkins file that looks like:

       stage("build") {
           parallel x86: {
                cosa_oc("creds-x86", "build.steps")
          } aarch64: {
                cosa_oc("creds-aarch64", "builds.steps")
         }

Where build.steps is the example above and cosa_oc is a helper function to run oc start-build bc/cosa --env... --from-file=builds.steps.

Jenkins will still give us the pretty UI, we'll still get the logs.

Finally, the other angle that this gives us is something really slick for the Developer use case. One common pain-point (for those of us unlucky enough to be on Comcast/Spectrum/$EvilCable that only gets 25Mb up) is how miserable it is to upload to a cloud to test your changes, and that assumes you have all the secrets. By having COSA as a buildconfig with this interface, I want to bring a cosa remote command that launches your local {src,overrides} and /usr/lib/coreos-assembler in a remote on the OpenShift of your choosing.

Imagine:
cosa remote --cred ocp.yaml --stages build-upload-to-gcp.steps

Where build-upload-to-gcp.steps has:

cosa buildprep {{ .S3.Url }}
cosa init {{ .Recipe.GitUrl }}
cosa fetch
cosa build --skip-prune
cosa buildextend-gcp --upload....

darkmuggle · 2020-09-28T15:06:19Z

Okay, I think this is ready for review.

At the high level, I'm electing to keep calling the JobSpec the "JobSpec"; we have prior art in the existing JobSpec, and in the existing pipeline. And at the end-of-the-day, I disagree that someone will confuse an RPM spec with Jobspec. Should we incept a generic build "properties" file, we can add a "property."

Finally, as stated, this is a starting point for distributed builds.

ashcrow

The code looks good to me but I'd like to see some some info put into a markdown doc so folks who see this for the first time get a sense of how to use it and/or what it is for.

ashcrow · 2020-09-28T15:48:34Z

entrypoint/spec/rhcos_jobspec.go

@@ -0,0 +1,161 @@
+package spec
+
+type Aliyun struct {


nit: In the future I'd love to see these structs with go docs added

I did a little bit more than doc string 'em. I removed the Jenkins Specific pieces.

cgwalters

I think in general I don't understand why we aren't just templating a set -euo pipefail shell script basically. The only slightly tricky thing there is to handle escaping arguments but...ah yeah we have shellquote.Join vendored in mantle already.

BTW I created https://crates.io/crates/sh-inline to do something similar to this for Rust to use in ostree's test suite - it's not quite the same because that's more about inline whereas this seems more about having the templates maintained outside of the code, but some similar ideas.

entrypoint/cmd/entry.go

cgwalters · 2020-09-28T19:02:31Z

entrypoint/cmd/entry.go

+	// cosaDir is set during a complile setting.
+	if cosaDir == "" {
+		path, err := os.Getwd()
+		if err != nil {


Hmm...os.Getwd() can only fail I think in situations where the current directory is deleted underneath the process...why are we trying to handle that?

being pendantic

being pendantic

@darkmuggle To clarify do you mean that you're being pedantic with extra checking?

Correct, I'm being pedantic with the extra check. It doesn't hurt anything. I definitely wasn't saying that Mr. Walters was being pedantic in the review (which is one way to read my poorly formed sentence fragment.)

Thanks! To clarify I did read it the first way initially.

But my higher level uncertanty isn't checking for errors - that's always good IMO (now that I look I phrased this badly) so I more meant why are we silently falling back instead of propagating the error? A whole lot of tools out there will fail if their current working directory is deleted - usually this is a user error.

IOW why not just:

if cosaDir == { path, err := os.Getwd() if err != nil { return err }

Not a big deal obviously, just one of those detail things that can be good to have a rough consensus on since the pattern tends to reoccur elsewhere.

cgwalters · 2020-09-28T19:04:30Z

entrypoint/exec/go_init.go

+	"time"
+)
+
+func RemoveZombies(ctx context.Context, wg *sync.WaitGroup) {


I'd personally be more in favor of adding an assertion that we're not pid 1 (i.e. require that the caller has used --init or dumb-init in Kubernetes) and documenting how to do that rather than also carrying our own init. But not a blocker.

entrypoint/exec/go_init.go

cgwalters · 2020-09-28T19:07:01Z

entrypoint/exec/go_init.go

+	}
+
+	// Wait for command to exit
+	err = cmd.Wait()


Doing this "start + wait + exit" pattern I think should always raise the question of why we're not just directly exec()-ing the child process.

run-steps can run N files; we wait to see if that the command is done.

cgwalters · 2020-09-28T19:22:39Z

Forgot to add though I definitely agree with the argument that we should move some "high level steps" into cosa itself. We have a tension here between logic shared in groovy (e.g. https://github.com/coreos/coreos-ci-lib/ ) and this is effectively arguing for draining some of the groovy - which I think makes sense. It'd probably help a lot to understand this PR if e.g. it also added some steps/templates and then switched the .cci.jenkinsfile CI run to use those or so?

This provides template support for COSA commands with the goal of having COSA understand the RHCOS jobspec. The goal of this work is to provide templated commands. For example: cosa entry --spec rhcos.yaml run cosa init "{{ .Recipe.GitUrl}}" Or: cosa entry --spec rhcos.yaml run-steps rhcos.steps Where rhcos.steps has: cosa init "{{ .Recipe.GitUrl }}" cosa fetch cosa build The alternative entry is used when the envVar `COSA_CI_ENTRY=1`.

Signed-off-by: Ben Howard <[email protected]>

darkmuggle · 2020-09-29T02:05:20Z

and this is effectively arguing for draining some of the groovy - which I think makes sense.

To be blunt: this is an understatement. My core argument is that the Groovy is actually a liability on the RHCOS side. I think I make the case for what I'm arguing for and why in the added README.md -- and the vision I want to push.

ashcrow · 2020-09-29T14:17:28Z

entrypoint/README.md

@@ -0,0 +1,145 @@
+# Entrypoint


This is a fantastic write up! One nit which I do not think is worth looking at in this PR itself is I believe it would be best to have the "what and how to use" up front followed by the "why". I believe more people will be interested in what it is and how to use it before they end up asking what brought us from where we are today to this pattern.

Signed-off-by: Ben Howard <[email protected]>

jlebon · 2020-09-29T20:09:51Z

This overall looks sane to me. I still don't quite understand why this isn't just another mantle pkg so we share the same vendor directory, and also why we're bundling init functionality instead of just re-using dumb-init which is already in the container image, but we can resolve those in follow-ups.

/approve

darkmuggle · 2020-09-29T22:21:02Z

This overall looks sane to me. I still don't quite understand why this isn't just another mantle pkg so we share the same vendor directory, and also why we're bundling init functionality instead of just re-using dumb-init which is already in the container image, but we can resolve those in follow-ups.

@jlebon I hope that the next PR will help to answer the init function question.

Regarding making this another Mantle command, I did consider it -- and I even considered a shared vendor directory. There is some disagreement about the vendoring approach.

Mantle has a clear purpose in code in doing $THING with an artifact (run, test, publish), while this new command is more meta about running COSA itself. And I did ask two other people about whether I should put this in Mantle; the consensus was no.

cgwalters

/lgtm
This can't really break anything right now, so I'm fine moving forward and addressing more things as followups. In particular I think the time where we try to replace parts of the pipeline should be revealing.

openshift-ci-robot · 2020-09-30T14:47:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, cgwalters, darkmuggle, jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [ashcrow,cgwalters,darkmuggle,jlebon]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot added the do-not-merge/work-in-progress label Sep 22, 2020

openshift-ci-robot requested review from arithx and bgilbert September 22, 2020 23:05

openshift-ci-robot added the approved label Sep 22, 2020

darkmuggle requested review from ashcrow and ravanelli September 22, 2020 23:14

darkmuggle added design design discussion ok-to-bikeshed Permission is granted to bikeshed labels Sep 23, 2020

darkmuggle mentioned this pull request Sep 24, 2020

House-keeping: house GoLang code under go/src #1743

Closed

Entrypoint: add intial vendored modules

87f2b43

darkmuggle changed the title ~~WIP: Templated COSA command execution~~ Templated COSA command execution Sep 28, 2020

openshift-ci-robot removed the do-not-merge/work-in-progress label Sep 28, 2020

ashcrow reviewed Sep 28, 2020

View reviewed changes

cgwalters reviewed Sep 28, 2020

View reviewed changes

Ben Howard added 3 commits September 28, 2020 19:28

entry/README.md: add readme to explain purpose of entry

0f4d4f5

entry: run test and fmt checks on build

f54a9fc

Signed-off-by: Ben Howard <[email protected]>

ashcrow reviewed Sep 29, 2020

View reviewed changes

ashcrow approved these changes Sep 29, 2020

View reviewed changes

entry/README.md: PR updates

6e7cebf

Signed-off-by: Ben Howard <[email protected]>

cgwalters reviewed Sep 30, 2020

View reviewed changes

openshift-ci-robot assigned cgwalters Sep 30, 2020

openshift-ci-robot added the lgtm label Sep 30, 2020

openshift-merge-robot merged commit fe9ae78 into coreos:master Sep 30, 2020

darkmuggle deleted the pr/cosa-batch-template branch March 9, 2021 15:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Templated COSA command execution #1739

Templated COSA command execution #1739

darkmuggle commented Sep 22, 2020 •

edited

Loading

darkmuggle commented Sep 22, 2020 •

edited

Loading

darkmuggle commented Sep 22, 2020

jlebon commented Sep 23, 2020

darkmuggle commented Sep 23, 2020 •

edited

Loading

cgwalters commented Sep 23, 2020

darkmuggle commented Sep 23, 2020 •

edited

Loading

darkmuggle commented Sep 24, 2020

jlebon commented Sep 25, 2020

darkmuggle commented Sep 25, 2020 •

edited

Loading

darkmuggle commented Sep 28, 2020

ashcrow left a comment

ashcrow Sep 28, 2020

darkmuggle Sep 28, 2020

cgwalters left a comment •

edited

Loading

cgwalters Sep 28, 2020

darkmuggle Sep 28, 2020

ashcrow Sep 29, 2020

darkmuggle Sep 29, 2020

cgwalters Sep 30, 2020

cgwalters Sep 28, 2020

cgwalters Sep 28, 2020

darkmuggle Sep 28, 2020

cgwalters commented Sep 28, 2020

darkmuggle commented Sep 29, 2020 •

edited

Loading

ashcrow Sep 29, 2020

jlebon commented Sep 29, 2020

darkmuggle commented Sep 29, 2020

cgwalters left a comment

openshift-ci-robot commented Sep 30, 2020

Templated COSA command execution #1739

Templated COSA command execution #1739

Conversation

darkmuggle commented Sep 22, 2020 • edited Loading

darkmuggle commented Sep 22, 2020 • edited Loading

darkmuggle commented Sep 22, 2020

jlebon commented Sep 23, 2020

darkmuggle commented Sep 23, 2020 • edited Loading

cgwalters commented Sep 23, 2020

darkmuggle commented Sep 23, 2020 • edited Loading

darkmuggle commented Sep 24, 2020

jlebon commented Sep 25, 2020

darkmuggle commented Sep 25, 2020 • edited Loading

darkmuggle commented Sep 28, 2020

ashcrow left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cgwalters left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cgwalters commented Sep 28, 2020

darkmuggle commented Sep 29, 2020 • edited Loading

Choose a reason for hiding this comment

jlebon commented Sep 29, 2020

darkmuggle commented Sep 29, 2020

cgwalters left a comment

Choose a reason for hiding this comment

openshift-ci-robot commented Sep 30, 2020

darkmuggle commented Sep 22, 2020 •

edited

Loading

darkmuggle commented Sep 22, 2020 •

edited

Loading

darkmuggle commented Sep 23, 2020 •

edited

Loading

darkmuggle commented Sep 23, 2020 •

edited

Loading

darkmuggle commented Sep 25, 2020 •

edited

Loading

cgwalters left a comment •

edited

Loading

darkmuggle commented Sep 29, 2020 •

edited

Loading